The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

نویسندگان

  • Ilya O. Ryzhov
  • Warren B. Powell
  • Peter I. Frazier
چکیده

We derive a one-period look-ahead policy for finiteand infinite-horizon online optimal learning problems with Gaussian rewards. Our approach is able to handle the case where our prior beliefs about the rewards are correlated, which is not handled by traditional multi-armed bandit methods. Experiments show that our KG policy performs competitively against the best known approximation to the optimal policy in the classic bandit problem, and outperforms many learning policies in the correlated case.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ADAPTIVE FUZZY TRACKING CONTROL FOR A CLASS OF PERTURBED NONLINEARLY PARAMETERIZED SYSTEMS USING MINIMAL LEARNING PARAMETERS ALGORITHM

In this paper, an adaptive fuzzy tracking control approach is proposed for a class of single-inputsingle-output (SISO) nonlinear systems in which the unknown continuous functions may be nonlinearlyparameterized. During the controller design procedure, the fuzzy logic systems (FLS) in Mamdani type are applied to approximate the unknown continuous functions, and then, based on the minimal learnin...

متن کامل

Designing stable neural identifier based on Lyapunov method

The stability of learning rate in neural network identifiers and controllers is one of the challenging issues which attracts great interest from researchers of neural networks. This paper suggests adaptive gradient descent algorithm with stable learning laws for modified dynamic neural network (MDNN) and studies the stability of this algorithm. Also, stable learning algorithm for parameters of ...

متن کامل

Fuzzy adaptive tracking control for a class of nonlinearly parameterized systems with unknown control directions

This paper addresses the problem of adaptive fuzzy tracking control for aclass of nonlinearly parameterized systems with unknown control directions.In this paper, the nonlinearly parameterized functions are lumped into the unknown continuous functionswhich can be approximated by using the fuzzy logic systems (FLS) in Mamdani type. Then, the Nussbaum-type function is used to de...

متن کامل

The Introduction of a Heuristic Mutation Operator to Strengthen the Discovery Component of XCS

The extended classifier systems (XCS) by producing a set of rules is (classifier) trying to solve learning problems as online. XCS is a rather complex combination of genetic algorithm and reinforcement learning that using genetic algorithm tries to discover the encouraging rules and value them by reinforcement learning. Among the important factors in the performance of XCS is the possibility to...

متن کامل

The Introduction of a Heuristic Mutation Operator to Strengthen the Discovery Component of XCS

The extended classifier systems (XCS) by producing a set of rules is (classifier) trying to solve learning problems as online. XCS is a rather complex combination of genetic algorithm and reinforcement learning that using genetic algorithm tries to discover the encouraging rules and value them by reinforcement learning. Among the important factors in the performance of XCS is the possibility to...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Operations Research

دوره 60  شماره 

صفحات  -

تاریخ انتشار 2012